NSF PAR Search | NSF Public Access Repository


Global Minimizers of ℓp-Regularized Objectives Yield the Sparsest ReLU Neural Networks

Nakhleh, J; Nowak, R (December 2025, The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025))

Overparameterized neural networks can interpolate a given dataset in many different ways, prompting the fundamental question: which among these solutions should we prefer, and what explicit regularization strategies will provably yield these solutions? This paper addresses the challenge of finding the sparsest interpolating ReLU network—i.e., the network with the fewest nonzero parameters or neurons—a goal with wide-ranging implications for efficiency, generalization, interpretability, theory, and model compression. Unlike post hoc pruning approaches, we propose a continuous, almost-everywhere differentiable training objective whose global minima are guaranteed to correspond to the sparsest single-hidden-layer ReLU networks that fit the data. This result marks a conceptual advance: it recasts the combinatorial problem of sparse interpolation as a smooth optimization task, potentially enabling the use of gradient-based training methods. Our objective is based on minimizing ℓp quasinorms of the weights for 0 < p < 1, a classical sparsity-promoting strategy in finite-dimensional settings. However, applying these ideas to neural networks presents new challenges: the function class is infinite-dimensional, and the weights are learned using a highly nonconvex objective. We prove that, under our formulation, global minimizers correspond exactly to sparsest solutions. Our work lays a foundation for understanding when and how continuous sparsity-inducing objectives can be leveraged to recover sparse networks through training.
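As a rough illustration of why ℓp quasinorms with 0 < p < 1 are regarded as sparsity-promoting, the sketch below compares the penalty sum_i |w_i|^p on two weight vectors with the same ℓ1 norm. This is not the training objective from the paper, which the abstract does not spell out. The function name `lp_penalty` and the example vectors are assumptions made only for illustration.

```python
def lp_penalty(weights, p):
    """Return sum_i |w_i|^p, the p-th power of the l_p quasinorm.

    Hypothetical helper for illustration only; requires 0 < p < 1.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return sum(abs(w) ** p for w in weights)


# Two illustrative weight vectors with the same l1 norm (both equal 1).
sparse_weights = [1.0, 0.0, 0.0, 0.0]
dense_weights = [0.25, 0.25, 0.25, 0.25]

p = 0.5
sparse_penalty = lp_penalty(sparse_weights, p)  # 1.0 ** 0.5 = 1.0
dense_penalty = lp_penalty(dense_weights, p)    # 4 * (0.25 ** 0.5) = 2.0

print(f"sparse: {sparse_penalty}, dense: {dense_penalty}")
```

With p = 0.5 the vector with a single nonzero entry receives the smaller penalty even though an ℓ1 penalty would not distinguish the two, which is the intuition behind preferring p < 1 when the goal is few nonzero weights.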
Free, publicly-accessible full text available December 3, 2026
Task vectors in in-context learning: Emergence, formation, and benefit

Yang, L; Lin, Z; Lee, K; Papailiopoulos, D; Nowak, R (October 2025, Conference on Language Modeling)

In-context learning is a remarkable capability of transformers, referring to their ability to adapt to specific tasks based on a short history or context. Previous research has found that task-specific information is locally encoded within models as task vectors, though the emergence and functionality of these vectors remain unclear due to opaque pre-training processes. In this work, we investigate the formation of task vectors in a controlled setting, using models trained from scratch on synthetic datasets. Our findings confirm that task vectors naturally emerge under certain conditions, but the tasks may be relatively weakly and/or non-locally encoded within the model. To promote strong task vectors encoded at a prescribed location within the model, we propose an auxiliary training mechanism based on a task vector prompting loss (TVP-loss). This method eliminates the need to search for task-correlated encodings within the trained model and demonstrably improves robustness and generalization.
Free, publicly-accessible full text available October 7, 2026
A Fully First-Order Method for Stochastic Bilevel Optimization

Kwon, J; Kwon, D; Wright, SJ; Nowak, R (June 2023, ICML)

Full Text Available
Bilinear Bandits with Low-Rank Structure

Jun, K-S; Willett, R; Nowak, R; Wright, S (January 2019, Proceedings of Machine Learning Research)

Full Text Available
